Detecting Gender by Full Name: Experiments with the Russian Language
Abstract
This paper describes a method that detects the gender of a person from their full name. While some approaches have been proposed for English, little has been done so far for Russian. We fill this gap and present a large-scale experiment on a dataset of 100,000 Russian full names from Facebook. Our method is based on three types of features (word endings, character n-grams and a dictionary of names) combined within a linear supervised model. Experiments show that this simple and computationally efficient approach achieves accuracy of up to 96%.
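The three feature types named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy name lists, the training pairs, the feature naming scheme and the choice of a plain perceptron as the linear model are all assumptions made for illustration, and the tiny dataset says nothing about the accuracy reported in the paper.

```python
from collections import defaultdict

# Hypothetical toy dictionary of first names (the paper's dictionary is not reproduced here).
MALE_NAMES = {"иван", "сергей", "дмитрий"}
FEMALE_NAMES = {"анна", "ольга", "елена"}

# Hypothetical training pairs: +1 = female, -1 = male.
TRAIN = [
    ("Иван Петров", -1),
    ("Анна Петрова", 1),
    ("Сергей Иванов", -1),
    ("Ольга Иванова", 1),
    ("Дмитрий Смирнов", -1),
    ("Елена Смирнова", 1),
]


def extract_features(full_name):
    """Return a set of binary features for a full name."""
    feats = {"bias"}
    for token in full_name.lower().split():
        # 1) Word endings: the last 1, 2 and 3 characters of each token.
        for k in (1, 2, 3):
            feats.add(f"end{k}={token[-k:]}")
        # 2) Character trigrams over the token padded with boundary marks.
        padded = f"^{token}$"
        for i in range(len(padded) - 2):
            feats.add(f"tri={padded[i:i + 3]}")
        # 3) Dictionary lookup of known first names.
        if token in MALE_NAMES:
            feats.add("dict=male")
        if token in FEMALE_NAMES:
            feats.add("dict=female")
    return feats


def train_perceptron(examples, epochs=10):
    """Fit a linear model with the perceptron rule (an assumed stand-in)."""
    weights = defaultdict(float)
    for _ in range(epochs):
        for name, label in examples:
            feats = extract_features(name)
            score = sum(weights[f] for f in feats)
            predicted = 1 if score > 0 else -1
            if predicted != label:
                # Move the weights of the active features towards the true label.
                for f in feats:
                    weights[f] += label
    return weights


def predict(weights, full_name):
    """Classify a full name by the sign of the linear score."""
    score = sum(weights.get(f, 0.0) for f in extract_features(full_name))
    return "female" if score > 0 else "male"
```

Calling `train_perceptron(TRAIN)` and then `predict(weights, "Мария Кузнецова")` exercises the ending and n-gram features on a name that is absent from the toy dictionary, which is the situation where surname endings such as "-ова" carry the signal.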
Related resources
Gender Concept “Woman” in the Minds of the Russian People (Taking the Chinese as Reference) According to an Associative Experiment
The article is devoted to the study of language representations of the concept of “woman” in the minds of the Russian and Chinese people based on a comparison of associative experiments of two languages, identifying the dynamics of the concept in the language consciousness of the people, establishing the specificity of the concept in the Russian language picture of the world referring to the Ch...
Dictionary of Abstract and Concrete Words of the Russian Language: A Methodology for Creation and Application
The paper describes the first stage of a project on creating an electronic dictionary with numerical estimates of the degree of abstractness and concreteness of Russian words. Our approach is to integrate data obtained from several different sources: text corpora, psycholinguistic experiments, published dictionaries, markers of abstractness (certain suffixes) and a translation of a similar dict...
Analysis of Language Legislation of All 85 Russian Federation’s Subjects (Regions)
The analysis of the language legislation of all 85 subjects of the Russian Federation shows complete heterogeneity and diversity. Common legal guidelines in Federal law do not exist, because Federal legislation is obsolete and is largely whitespace and conflict. The subjects of the Russian Federation, on whose territory different ethnic groups, both large and indigenous, historically live, solv...
A novel hybrid method for vocal fold pathology diagnosis based on Russian language
In this paper, first, an initial feature vector for vocal fold pathology diagnosis is proposed. Then, for optimizing the initial feature vector, a genetic algorithm is proposed. Some experiments are carried out for evaluating and comparing the classification accuracies which are obtained by the use of the different classifiers (ensemble of decision tree, discriminant analysis and K-nearest neig...
Diminutivization supports gender acquisition in Russian children.
Gender agreement elicitation was used with Russian children to examine how diminutives common in Russian child-directed speech affect gender learning. Forty-six children (2;9-4;8) were shown pictures of familiar and of novel animals and asked to describe them after hearing their names, which all contained regular morphophonological cues to masculine or feminine gender. Half were presented as si...
Publication date: 2014